dynamic capacity region
Queue Up Your Regrets: Achieving the Dynamic Capacity Region of Multiplayer Bandits
Consider $N$ cooperative agents such that for $T$ turns, each agent $n$ takes an action $a_{n}$ and receives a stochastic reward $r_{n}\left(a_{1},\ldots,a_{N}\right)$. Agents cannot observe the actions of other agents and do not even know their own reward functions. The agents can communicate with their neighbors on a connected graph $G$ with diameter $d\left(G\right)$. We want each agent $n$ to achieve an expected average reward of at least $\lambda_{n}$ over time, for a given quality of service (QoS) vector $\boldsymbol{\lambda}$. Not every QoS vector $\boldsymbol{\lambda}$ is achievable.
By giving up on immediate reward, knowing that the other agents will compensate later, agents can improve their achievable capacity region.
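To make the last point concrete, the following is a minimal sketch of a toy example that is not taken from the paper. It assumes a hypothetical two-agent, two-action game in which an agent earns expected reward 1 only when it alone takes action 1, and it assumes the expected rewards are known (in the setting above they are not, so this only illustrates the capacity region, not a learning algorithm). The names `expected_rewards`, `static_feasible`, and `average_rewards` are illustrative.

```python
import itertools

# Hypothetical 2-agent, 2-action game (an assumption, not from the paper):
# an agent earns expected reward 1 only when it takes action 1 while the
# other agent takes action 0, e.g. two users sharing one channel.
def expected_rewards(a1, a2):
    r1 = 1.0 if (a1 == 1 and a2 == 0) else 0.0
    r2 = 1.0 if (a2 == 1 and a1 == 0) else 0.0
    return (r1, r2)

# All joint action profiles (a1, a2).
PROFILES = list(itertools.product([0, 1], repeat=2))

def static_feasible(qos):
    """True if some single fixed action profile meets the QoS vector."""
    return any(
        all(r >= q for r, q in zip(expected_rewards(*p), qos))
        for p in PROFILES
    )

def average_rewards(schedule):
    """Time-averaged expected rewards over a sequence of action profiles."""
    totals = [0.0, 0.0]
    for p in schedule:
        for n, r in enumerate(expected_rewards(*p)):
            totals[n] += r
    return tuple(t / len(schedule) for t in totals)

qos = (0.4, 0.4)

# Agents alternate: each gives up its reward on every other turn,
# knowing the other agent will do the same on the next turn.
schedule = [(1, 0), (0, 1)] * 50
avg = average_rewards(schedule)
dynamic_ok = all(r >= q for r, q in zip(avg, qos))
```

In this toy game no fixed action profile gives both agents an expected reward of at least 0.4, so `static_feasible(qos)` is false, whereas the alternating schedule averages 0.5 for each agent and meets the QoS vector.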